---
title: "Space Missions 1957/10/04 - 2022/07/29"
# author: "MG"
date: "Modified: `r Sys.Date()`"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
    social: menu
    source_code: embed
---
<!-- to get pdf
always_allow_html: true
output:
  pdf_document: default
-->
```{r setup, include=FALSE}
library(flexdashboard)
# to install flexdashboard
#install.packages("flexdashboard")
# flexdashboard intro
# https://pkgs.rstudio.com/flexdashboard/articles/flexdashboard.html
# for not outputting echo
knitr::opts_chunk$set(echo = FALSE, results='hide', message=FALSE, warning=FALSE)
```
```{r loadlib}
# the default timeframe of the dataframe
LAUNCH_PERIOD="10/4/1957 - 7/29/2022"
LAUNCH_PERIOD_DOT=paste0(LAUNCH_PERIOD, ".")
# most intense color
MOST_INTENSE_COLOR="#132B43" # dark blue
LEAST_INTENSE_COLOR="#56B1F7" # light blue
```
```{r}
# read the csv file
library(readr)
df <- read_csv('../mission-all-fields.csv')
library(dplyr)
# we will be modifying df so making a copy
df_facility <- df
# concatenate Facility, State, Area and skip NA values
df_facility$Place <- apply(df[,3:5], 1, function(x) paste(na.omit(x), collapse = ", "))
# how many NA is per column in df_facility
sapply(df_facility, function(x) sum(is.na(x)))
# ------------------
# for launches
# -----------------
# find total number of launches per place and their successes and failures
mission_status_df <- df_facility %>% group_by(Place) %>%
summarize(Total = n(),
Success = sum(MissionStatus == 1),
Failure = sum(MissionStatus == 2),
PartialFailure = sum(MissionStatus == 3),
PrelaunchFailure = sum(MissionStatus == 4),
JustFailure = Total - Success) %>%
arrange(Place)
# add abbreviation for diagrams
mission_status_df$PlaceAbbrv <- abbreviate(mission_status_df$Place, minlength = 45)
# have percentages ready for use
total_launches <- sum(mission_status_df$Total)
library(scales)
#' Compute the percentage
#' @param value Value to be presented as percentage
#' @param digi What precision, how many digits after decimal point
#' @return rounded value with digi digits after decimal point
percentage <- function(value, digi) {
round(value*100, digits = digi)
}
# total percent success and total percent of failures
#mission_status_df$SuccessPercent <- percent(mission_status_df$Success/total_launches, accuracy = 0.1)
#mission_status_df$FailurePercent <- percent(mission_status_df$JustFailure/total_launches, accuracy = 0.1)
mission_status_df$SuccessPercent <- percentage(mission_status_df$Success/total_launches, 1)
mission_status_df$FailurePercent <- percentage(mission_status_df$JustFailure/total_launches, 1)
mission_status_df$TotalPercent <- percentage(mission_status_df$Total/total_launches, 1)
mission_status_df$RelSuccessPerc <- percentage(mission_status_df$Success/mission_status_df$Total, 1)
mission_status_df$RelFailurePerc <- percentage(mission_status_df$JustFailure/mission_status_df$Total, 1)
# find out total number of successes, failures and others
totals_df <- colSums(mission_status_df[,2:6])
```
Facilities {data-orientation=columns}
====================================================
Column {data-width=150}
-------------------------------------------------------
### Global No. of Launches
```{r, results='markup'}
valueBox(total_launches, icon="fa-rocket")
```
### No. of Launching Countries/Locations
```{r, results='markup'}
valueBox(nrow(df_facility %>% distinct(Area)), icon="glyphicon-map-marker")
```
### No. of Launching Facilities
```{r, results='markup'}
valueBox(nrow(df_facility %>% distinct(Place)), icon="fa-building")
```
Column {data-width=400}
--------------------------------------
### Histogram of no. of facilities per country. {.no-padding}
```{r, fig.width=8}
# get the number of facilities per country
countries_facilities_df <- df_facility %>% select(Area, Facility) %>% distinct(Facility, Area) %>% group_by(Area) %>% summarize(FacilityCount = n()) %>% arrange(desc(FacilityCount))
# add the percentage column
facilities_count <- sum(countries_facilities_df$FacilityCount)
countries_facilities_df$Percent <- percentage(countries_facilities_df$FacilityCount / facilities_count, 0)
hist(countries_facilities_df$FacilityCount,
xlab = "No. of Launching Facilities",
ylab = "No. of Countries",
labels=TRUE,
xlim = c(0, 20),
ylim = c(0, 20),
main = NULL)
axis(side = 1, at = seq(0,20, 1))
```
> The majority of countries had at most two launching facilities.
### Histogram of no. of launches per facility. {.no-padding}
```{r, fig.width=8}
hist(mission_status_df$Total,
xlab = "No. of Launches",
ylab = "No. of Facilities",
ylim=c(0,50),
labels=TRUE, main = NULL)
```
> `r paste0(percentage(mean(mission_status_df$Total <= 200), 0), '%')` of facilities performed no more than 200 launches.
<!--
### Facilities with <= 200 launches {data-width=100}
```{r, results='markup'}
gauge(percentage(41/nrow(df_facility %>% distinct(Place)),0), min = 0, max = 100, symbol = '%'
)
```
-->
Column {data-height=500 data-width=500}
-----------------------------
### Percent of facilities located in a country. {.no-padding .no-title}
```{r, fig.height=5}
library(ggplot2)
ggplot(data=countries_facilities_df, aes(reorder(Area, Percent), Percent, fill=Percent)) +
geom_bar(stat="identity") +
geom_text(aes(label = Percent), vjust=0.5, hjust=-0.3) +
scale_fill_continuous(high = MOST_INTENSE_COLOR, low = LEAST_INTENSE_COLOR) +
scale_y_continuous(limits= c(0,37)) +
xlab('Facility Location') +
ylab('[%] of Total No. of Facilities') +
coord_flip() +
labs(fill='% Facilities\nLocated\nin Country',
title='Percent of facilities located in a country.')
```
> In the Yellow Sea and the Barents Sea, rockets were launched from vessels.
By Location {data-orientation=rows data-navmenu="Launches" data-navmenu-icon="fa-rocket"}
====================================================
Row
-------------------------------------
### Locations/countries w.r.t. the global no. of launches. {.no-padding}
```{r}
# launches per location/country
launches_per_loc_df <- df_facility %>%
group_by(Area) %>%
summarize(Total = n(), Percent = percentage(Total/ total_launches, 1)) %>%
arrange(desc(Total))
launches_per_loc_df %>% select (Area, Percent) %>%
ggplot(aes(reorder(Area, Percent), Percent, fill=Percent)) +
geom_bar(position=position_dodge(), stat="identity") +
geom_text(aes(label = Percent), vjust=0.5, hjust=-0.3) +
scale_fill_continuous(high = MOST_INTENSE_COLOR, low = LEAST_INTENSE_COLOR) +
xlab('Location/Country') +
ylab('[%] of Total Launches') +
ylim(0,35) +
coord_flip() +
labs(fill='Percent',
#title=paste('Locations/countries w.r.t. the global no. of launches.'),
subtitle=paste('There were', total_launches, 'launches in total.'
))
```
> The top five countries/locations accounted for over 92% of global launches.
### Top 10 launching facilities w.r.t. the number of launches. {.no-padding}
```{r}
mission_status_df %>%
select(PlaceAbbrv, TotalPercent) %>%
arrange(desc(TotalPercent)) %>%
head(10) %>%
ggplot(aes(reorder(PlaceAbbrv, TotalPercent), TotalPercent, fill=TotalPercent)) +
geom_bar(position=position_dodge(), stat="identity") +
geom_text(aes(label = TotalPercent), vjust=0.5, hjust=-0.3) +
scale_fill_continuous(high = MOST_INTENSE_COLOR, low = LEAST_INTENSE_COLOR) +
xlab('Facility') +
ylab('[%] of Total Launches') +
ylim(0,30) +
coord_flip() +
labs(fill='Percent',
#title='Top 10 facilities w.r.t. total no. of launches.',
subtitle=paste('There were', total_launches, 'launches in total.'))
```
By Year {data-orientation=columns data-navmenu="Launches"}
====================================================
Column {.no-padding data-width=300}
------------------
### Global No. of Launches: `r total_launches` {.no-title data-height=115 .no-padding}
Global No. of Launches: `r total_launches`
### Space mission launch successes and failures. {.no-title data-height=400 .no-padding}
```{r, fig.width=8, fig.align='left'}
# install.packages("lessR")
library(lessR)
counts_c <- totals_df[2:5]
# Color palette
colors <- hcl.colors(length(counts_c), "Blues")
pc(counts_c,
radius = 1,
hole = 0.50,
data = counts_c,
fill=colors,
values_size = 1.55,
labels_cex=1.55,
main_cex=1.55,
main='Space missions launch successes and failures.')
```
```{r}
# how many launches per year
#install.packages("lubridate")
library(lubridate)
# get the year from the date
# for some reason this does not work here, although it works in an R script
# #years_vec <- format(as.Date(df$Date, format="%Y-%m-%d"),"%Y")
years_vec <- year(mdy(df$Date))
# table() builds the frequency table; as_tibble() replaces the deprecated as_data_frame()
years_df <- as_tibble(table(years_vec))
# rename the columns
colnames(years_df) <- c('Year', 'LaunchesCount')
# 'Year' holds the year labels as text (character or factor, depending on the
# conversion), so we convert it to numeric values via
# converting to a character and then to numeric
years_df$Year <- as.numeric(as.character(years_df$Year))
# add the percentage relative to the total launches
years_df$Percent <- percentage(years_df$LaunchesCount/total_launches,1)
# when the maximum number of launches occurred
max_number_of_launches <- years_df %>% filter(LaunchesCount == max(LaunchesCount))
# when the minimum number of launches occurred
min_number_of_launches <- years_df %>% filter(LaunchesCount == min(LaunchesCount))
# fast method for finding the second lowest: a partial sort only guarantees
# that the element at position 2 is in its sorted place
second_lowest <- sort(years_df$LaunchesCount, partial = 2)[2]
second_min <- years_df %>% filter(LaunchesCount == second_lowest)
```
### Top 10 years with the most launches. {.no-title .no-padding data-height=400}
```{r, fig.width=8}
most_launches_per_year <- years_df %>%
arrange(desc(LaunchesCount), desc(Year)) %>%
head(10)
# without conversion to a character, ggplot treats Year as a numerical value
# and displays it accordingly on the year axis (i.e., with gaps)
most_launches_per_year$Year <- as.character(most_launches_per_year$Year)
most_launches_per_year %>%
ggplot(aes(reorder(Year, Percent), y=Percent, fill=Percent)) +
geom_bar(position=position_dodge(), stat="identity") +
geom_text(aes(label = Percent), vjust=0.5, hjust=-0.2) +
coord_flip() +
#scale_y_continuous(breaks = seq(from=0, to=160, by=10)) +
#scale_x_continuous(breaks = seq(from=0, to=100, by=10))
scale_fill_continuous(high = MOST_INTENSE_COLOR, low = LEAST_INTENSE_COLOR) +
xlab('Year') +
ylab('[%] of Launches Globally') +
labs(fill='[%] of\nLaunches\nGlobally',
title= 'Top 10 years with the most launches a year.')
```
> The highest no. of launches, `r max_number_of_launches[2]`, occurred in `r max_number_of_launches[1]`.
Column {.no-padding data-width=600}
------------------------
### Highest and Lowest Number of Launches per Year {.no-title .no-padding}
```{r, fig.width=10, fig.height=6}
# percentages w.r.t. the global number of launches
years_df %>%
ggplot(aes(x=Year, y=Percent, fill=Percent)) +
geom_bar(position=position_dodge(), stat="identity") +
theme(axis.text.x = element_text(angle=90, vjust=.5, hjust=1, size=6)) +
scale_fill_continuous(high = MOST_INTENSE_COLOR, low = LEAST_INTENSE_COLOR) +
ylab('[%] of Launches Globally') +
labs(fill='% of\nLaunches\nGlobally',
title='Percentage of launches per year globally.')
# for absolute numbers you can use this
# years_df %>%
# ggplot(aes(x=Year, y=LaunchesCount, fill=LaunchesCount)) +
# geom_bar(position=position_dodge(), stat="identity") +
# theme(axis.text.x = element_text(angle=90, vjust=.5, hjust=1, size=6)) +
# scale_fill_continuous(high = MOST_INTENSE_COLOR, low = LEAST_INTENSE_COLOR) +
# ylab('No. of Launches') +
# labs(fill='Number of\nLaunches',
# title=paste(
# 'The number of launches per year globally for the period', LAUNCH_PERIOD_DOT))
```
> The no. of launches was highest (about 100 per year) in 1966-1978 and 2016-2022. The lowest yearly no. of launches occurred in 2001-2015, ranging from
`r min(years_df[years_df$Year >= 2001 & years_df$Year <= 2015,]$LaunchesCount)`
(`r min(years_df[years_df$Year >= 2001 & years_df$Year <= 2015,]$Percent)`% of global launches) to `r max(years_df[years_df$Year >= 2001 & years_df$Year <= 2015,]$LaunchesCount)` (`r max(years_df[years_df$Year >= 2001 & years_df$Year <= 2015,]$Percent)`% of global launches).
Rockets {data-orientation=columns}
====================================================
Column {data-width=280}
---------------------------
```{r}
# how many and what type of rockets are used
# how many times each rocket was used and what is its status (retired or not)
# how long it served
# first deal with the date and convert it to the right format
# for some reason the code that works fine in an R script
# does not work in an R Markdown chunk, so here it is done in small steps
rockets_df <- df %>%
select(Rocket, RocketStatus, Date)
# convert the Date column from character to the Date
rockets_df$Date <- mdy(rockets_df$Date)
# now deal with other stuff related to a date column
# the OperationInYears - is an approximate number (the year is assumed to
# have 365 days)
rockets_df <- rockets_df %>%
  group_by(Rocket, RocketStatus) %>%
  summarize(LaunchCount = n(), FirstLaunch = min(Date), LastLaunch = max(Date),
            # days between the first and the last launch
            OperationInDays = as.numeric(difftime(max(Date), min(Date), units = "days")),
            OperationInYears = round(OperationInDays / 365, digits = 1),
            .groups = "drop") %>%
  arrange(desc(LaunchCount), desc(Rocket))
rockets_df$CountPerc <- percentage(rockets_df$LaunchCount / sum(rockets_df$LaunchCount), 1)
# hack to make nice labels in PieChart - I need to add a label
# otherwise I have 0 and 1, so here we go
# add the column with labels
rockets_df$RocketStatusName <- with(rockets_df, ifelse(RocketStatus == 0, "Retired", "Active"))
```
### Rockets: Active vs. Retired {.no-padding .no-title data-height=400}
```{r}
# how many rockets are active vs retired
PieChart(RocketStatusName, data=rockets_df,
fill=c("red","lightblue"),
values_size = 1.5,
values = "%",
labels_cex = 1.5,
cex = 1.5,
hole = 0.4,
main="Active vs. retired rockets as of 7/29/2022."
)
```
### Rockets {.no-padding .no-title data-height=500}
```{r, fig.height=5}
hist(rockets_df$LaunchCount, xlab = "No. of Launches",
ylab = "Rocket Count",
xlim=c(0,500), ylim=c(0,400), labels=TRUE, main = 'Histogram of no. of launches per rocket.'
)
```
> `r percentage(mean(rockets_df$LaunchCount <= 50), 0)`% of rockets were launched $\le$ 50 times.
Column {data-width=400}
----------------------------------------
### Rockets Count {.no-title .no-padding data-height=250}
No. of rockets: `r nrow(rockets_df)`

No. of launches: `r total_launches`
### Top 10 Most Popular Rockets {.no-padding .no-title data-height=800}
```{r}
# top 10 most popular rockets
top10_rockets_df <- rockets_df %>%
select(Rocket, LaunchCount, CountPerc, RocketStatusName, FirstLaunch, LastLaunch, OperationInYears) %>%
arrange(desc(LaunchCount)) %>%
head(10)
```
```{r, fig.height=6}
top10_rockets_df %>%
ggplot(aes(reorder(Rocket, CountPerc), y=CountPerc)) +
geom_bar(position=position_dodge(), stat="identity", aes(fill=factor(RocketStatusName), group=1), show.legend = TRUE) +
theme(axis.text.x = element_text(angle=0, vjust=.5, hjust=1)) +
scale_fill_discrete(labels = c("Active", "Retired")) +
#scale_fill_continuous(high = MOST_INTENSE_COLOR, low = LEAST_INTENSE_COLOR) +
ylab('[%] of Total No. of Launches') +
xlab('Rocket') +
labs(fill="Rocket Status",
title='Top 10 most popular rockets of all time.') +
coord_flip()
```
> The `r top10_rockets_df$Rocket[1]` rocket was the most popular, with
`r top10_rockets_df$LaunchCount[1]` launches (`r top10_rockets_df$CountPerc[1]`% of total launches).
Column {data-width=500}
-------------------------
### Top 10 Longest Serving Rockets {.no-title .no-padding data-height=400}
```{r}
# top 10 rockets serving the longest
top10_longest_rockets_df <- rockets_df %>%
select(Rocket, LaunchCount, CountPerc, RocketStatusName, FirstLaunch, LastLaunch, OperationInYears) %>%
arrange(desc(OperationInYears)) %>%
head(10)
```
```{r, fig.width=10}
hist(rockets_df$OperationInYears, xlab = "Operation in Years",
ylab= "Rocket Count",
xlim=c(0,50), ylim=c(0,250), labels=TRUE, breaks=9,
main = 'Histogram of operational rocket lifespan.')
```
> `r percentage(mean(rockets_df$OperationInYears <= 5), 0)`% of rockets operated for $\le$ 5 years.
### Longest Serving Rockets {.no-title .no-padding data-height=600}
```{r, fig.width=9}
top10_longest_rockets_df %>%
ggplot(aes(reorder(Rocket, OperationInYears), y=OperationInYears)) +
geom_bar(position=position_dodge(), stat="identity", aes(fill=factor(RocketStatusName), group=1), show.legend = TRUE) +
theme(axis.text.x = element_text(angle=0, vjust=.5, hjust=1)) +
scale_fill_discrete(labels = c("Active", "Retired")) +
#scale_fill_continuous(high = MOST_INTENSE_COLOR, low = LEAST_INTENSE_COLOR) +
ylab('Years') +
xlab('Rocket') +
labs(fill="Rocket Status",
title='Rockets with the longest operational lifespan.') +
coord_flip()
```
> `r top10_longest_rockets_df[1,][1]` was operational for `r top10_longest_rockets_df[1,][7]` years (from `r top10_longest_rockets_df[1,][5]` to `r top10_longest_rockets_df[1,][6]`). The longest-serving rocket still active is `r top10_longest_rockets_df[3,][1]` (`r top10_longest_rockets_df[3,][7]` years).
About {data-orientation=rows}
====================================================
Row
---------------------------
### About {.no-title}
#### Data
The data for this study is the "Space Missions" dataset from Maven Analytics [https://app.mavenanalytics.io/datasets]. It
contains 4,630 records of space missions around the world for the period 10/4/1957 - 7/29/2022.
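
As a rough sketch, the record count and the covered period could be confirmed along these lines. It assumes the dates are month/day/year strings in a `Date` column, as in the analysis code. The small `missions` data frame is made up for illustration and stands in for the real CSV file.

```r
# Hypothetical stand-in for the dataset; only the Date column matters here.
missions <- data.frame(
  Rocket = c("Example A", "Example B", "Example C"),
  Date = c("10/4/1957", "5/18/1973", "7/29/2022"),
  stringsAsFactors = FALSE
)

# Parse month/day/year strings into Date objects (base R, no extra packages).
launch_dates <- as.Date(missions$Date, format = "%m/%d/%Y")

# Number of records and the first and last launch dates.
record_count <- nrow(missions)
period <- range(launch_dates)

cat("Records:", record_count, "\n")
cat("Period:", format(period[1]), "to", format(period[2]), "\n")
```

With the real file, `missions` would come from `read_csv()` instead of the inline data frame.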
#### License
I reached out to the staff at Maven Analytics and they said that the dataset can be freely used.
#### Data Credibility
I checked the credibility of the data by verifying some randomly selected observations against other sources.
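
One way such a spot check could be set up is sketched below. The seed, the sample size of five, and the `missions` data frame are assumptions for illustration, not the procedure actually used for this study.

```r
# Hypothetical stand-in for the mission records.
missions <- data.frame(
  Id = 1:20,
  Rocket = paste("Example rocket", 1:20),
  stringsAsFactors = FALSE
)

# Fix the seed so the same rows can be re-checked later.
set.seed(2022)

# Draw five distinct row indices and pull those records for manual review.
sample_rows <- sort(sample(nrow(missions), size = 5))
spot_check <- missions[sample_rows, ]

print(spot_check)
```

Each sampled record would then be compared by hand against an independent source.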
#### Data Processing
- I performed cleaning of the data.
- Some data is missing, especially in the price field.
- Data has been slightly reformatted compared to the original data for the purpose of the analysis.
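
The extent of the missing data can be summarized per column, in the same way the analysis code counts `NA` values. The sketch below uses a made-up data frame, and the column name `Price` is an assumption about how the price field is named.

```r
# Hypothetical stand-in; the Price column is mostly missing, as noted above.
missions <- data.frame(
  Rocket = c("Example A", "Example B", "Example C", "Example D"),
  Price = c(NA, 50, NA, NA),
  stringsAsFactors = FALSE
)

# Count missing values per column.
na_counts <- sapply(missions, function(x) sum(is.na(x)))

# Express the counts as a percentage of all rows.
na_percent <- round(100 * na_counts / nrow(missions), 1)

print(data.frame(Missing = na_counts, Percent = na_percent))
```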
#### Thanks
Thanks for looking at my work.
Comments on how it can be done better are always welcome.